source file
A Multi-Agent Framework for Stateful Inference-Time Search
Lalan, Arshika, Ghosh, Rajat, Kolsur, Aditya, Dutta, Debojyoti
Recent work explores agentic inference-time techniques to perform structured, multi-step reasoning. However, stateless inference often struggles on multi-step tasks due to the absence of persistent state. Moreover, task-specific fine-tuning or instruction-tuning often achieve surface-level code generation but remain brittle on tasks requiring deeper reasoning and long-horizon dependencies. To address these limitations, we propose stateful multi-agent evolutionary search, a training-free framework that departs from prior stateless approaches by combining (i) persistent inference-time state, (ii) adversarial mutation, and (iii) evolutionary preservation. We demonstrate its effectiveness in automated unit test generation through the generation of edge cases. We generate robust edge cases using an evolutionary search process, where specialized agents sequentially propose, mutate, and score candidates. A controller maintains persistent state across generations, while evolutionary preservation ensures diversity and exploration across all possible cases. This yields a generalist agent capable of discovering robust, high-coverage edge cases across unseen codebases. Experiments show our stateful multi-agent inference framework achieves substantial gains in coverage over stateless single-step baselines, evaluated on prevalent unit-testing benchmarks such as HumanEval and TestGenEvalMini and using three diverse LLM families - Llama, Gemma, and GPT. These results indicate that combining persistent inference-time state with evolutionary search materially improves unit-test generation.
You Have Been LaTeXpOsEd: A Systematic Analysis of Information Leakage in Preprint Archives Using Large Language Models
Dubniczky, Richard A., Borsos, Bertalan, Norbert, Tihanyi
The widespread use of preprint repositories such as arXiv has accelerated the communication of scientific results but also introduced overlooked security risks. Beyond PDFs, these platforms provide unrestricted access to original source materials, including LaTeX sources, auxiliary code, figures, and embedded comments. In the absence of sanitization, submissions may disclose sensitive information that adversaries can harvest using open-source intelligence. In this work, we present the first large-scale security audit of preprint archives, analyzing more than 1.2 TB of source data from 100,000 arXiv submissions. We introduce LaTeXpOsEd, a four-stage framework that integrates pattern matching, logical filtering, traditional harvesting techniques, and large language models (LLMs) to uncover hidden disclosures within non-referenced files and LaTeX comments. To evaluate LLMs' secret-detection capabilities, we introduce LLMSec-DB, a benchmark on which we tested 25 state-of-the-art models. Our analysis uncovered thousands of PII leaks, GPS-tagged EXIF files, publicly available Google Drive and Dropbox folders, editable private SharePoint links, exposed GitHub and Google credentials, and cloud API keys. We also uncovered confidential author communications, internal disagreements, and conference submission credentials, exposing information that poses serious reputational risks to both researchers and institutions. We urge the research community and repository operators to take immediate action to close these hidden security gaps. To support open science, we release all scripts and methods from this study but withhold sensitive findings that could be misused, in line with ethical principles. The source code and related material are available at the project website https://github.com/LaTeXpOsEd
- North America > United States (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Asia > China (0.04)
- (5 more...)
Can Constructions "SCAN" Compositionality ?
Katrapati, Ganesh, Shrivastava, Manish
Sequence to Sequence models struggle at compositionality and systematic generalisation even while they excel at many other tasks. We attribute this limitation to their failure to internalise constructions conventionalised form meaning pairings that license productive recombination. Building on these insights, we introduce an unsupervised procedure for mining pseudo-constructions: variable-slot templates automatically extracted from training data. When applied to the SCAN dataset, our method yields large gains out-of-distribution splits: accuracy rises to 47.8 %on ADD JUMP and to 20.3% on AROUND RIGHT without any architectural changes or additional supervision. The model also attains competitive performance with? 40% of the original training data, demonstrating strong data efAciency. Our findings highlight the promise of construction-aware preprocessing as an alternative to heavy architectural or training-regime interventions.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (6 more...)
NeuRaLaTeX: A machine learning library written in pure LaTeX
Gardner, James A. D., Rowan, Will, Smith, William A. P.
In this paper, we introduce NeuRaLaTeX, which we believe to be the first deep learning library written entirely in LaTeX. As part of your LaTeX document you can specify the architecture of a neural network and its loss functions, define how to generate or load training data, and specify training hyperparameters and experiments. When the document is compiled, the LaTeX compiler will generate or load training data, train the network, run experiments, and generate figures. This paper generates a random 100 point spiral dataset, trains a two layer MLP on it, evaluates on a different random spiral dataset, produces plots and tables of results. The paper took 48 hours to compile and the entire source code for NeuRaLaTeX is contained within the source code of the paper. We propose two new metrics: the Written In Latex (WIL) metric measures the proportion of a machine learning library that is written in pure LaTeX, while the Source Code Of Method in Source Code of Paper (SCOMISCOP) metric measures the proportion of a paper's implementation that is contained within the paper source. We are state-of-the-art for both metrics, outperforming the ResNet and Transformer papers, as well as the PyTorch and Tensorflow libraries. Source code, documentation, videos, crypto scams and an invitation to invest in the commercialisation of NeuRaLaTeX are available at https://www.neuralatex.com
LLM Test Generation via Iterative Hybrid Program Analysis
Gu, Sijia, Nashid, Noor, Mesbah, Ali
Automating unit test generation remains a significant challenge, particularly for complex methods in real-world projects. While Large Language Models (LLMs) have made strides in code generation, they struggle to achieve high branch coverage due to their limited ability to reason about intricate control flow structures. To address this limitation, we introduce Panta, a technique that emulates the iterative process human developers follow when analyzing code and constructing test cases. Panta integrates static control flow analysis and dynamic code coverage analysis to systematically guide LLMs in identifying uncovered execution paths and generating better test cases. By incorporating an iterative feedback-driven mechanism, our technique continuously refines test generation based on static and dynamic path coverage insights, ensuring more comprehensive and effective testing. Our empirical evaluation, conducted on classes with high cyclomatic complexity from open-source projects, demonstrates that Panta achieves 26% higher line coverage and 23% higher branch coverage compared to the state-of-the-art.
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
Overcoming Vision Language Model Challenges in Diagram Understanding: A Proof-of-Concept with XML-Driven Large Language Models Solutions
Shiinoki, Shue, Koshihara, Ryo, Motegi, Hayato, Morishige, Masumi
Diagrams play a crucial role in visually conveying complex relationships and processes within business documentation. Despite recent advances in Vision-Language Models (VLMs) for various image understanding tasks, accurately identifying and extracting the structures and relationships depicted in diagrams continues to pose significant challenges. This study addresses these challenges by proposing a text-driven approach that bypasses reliance on VLMs' visual recognition capabilities. Instead, it utilizes the editable source files--such as xlsx, pptx or docx--where diagram elements (e.g., shapes, lines, annotations) are preserved as textual metadata. In our proof-of-concept, we extracted diagram information from xlsx-based system design documents and transformed the extracted shape data into textual input for Large Language Models (LLMs). This approach allowed the LLM to analyze relationships and generate responses to business-oriented questions without the bottleneck of image-based processing. Experimental comparisons with a VLM-based method demonstrated that the proposed text-driven framework yielded more accurate answers for questions requiring detailed comprehension of diagram structures.The results obtained in this study are not limited to the tested .xlsx files but can also be extended to diagrams in other documents with source files, such as Office pptx and docx formats. These findings highlight the feasibility of circumventing VLM constraints through direct textual extraction from original source files. By enabling robust diagram understanding through LLMs, our method offers a promising path toward enhanced workflow efficiency and information analysis in real-world business scenarios.
- Asia > Japan (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Switzerland (0.04)
- (3 more...)
SkillScope: A Tool to Predict Fine-Grained Skills Needed to Solve Issues on GitHub
Carter, Benjamin C., Contreras, Jonathan Rivas, Villegas, Carlos A. Llanes, Acharya, Pawan, Utzerath, Jack, Farner, Adonijah O., Jenkins, Hunter, Johnson, Dylan, Penney, Jacob, Steinmacher, Igor, Gerosa, Marco A., Santos, Fabio
New contributors often struggle to find tasks that they can tackle when onboarding onto a new Open Source Software (OSS) project. One reason for this difficulty is that issue trackers lack explanations about the knowledge or skills needed to complete a given task successfully. These explanations can be complex and time-consuming to produce. Past research has partially addressed this problem by labeling issues with issue types, issue difficulty level, and issue skills. However, current approaches are limited to a small set of labels and lack in-depth details about their semantics, which may not sufficiently help contributors identify suitable issues. To surmount this limitation, this paper explores large language models (LLMs) and Random Forest (RF) to predict the multilevel skills required to solve the open issues. We introduce a novel tool, SkillScope, which retrieves current issues from Java projects hosted on GitHub and predicts the multilevel programming skills required to resolve these issues. In a case study, we demonstrate that SkillScope could predict 217 multilevel skills for tasks with 91% precision, 88% recall, and 89% F-measure on average. Practitioners can use this tool to better delegate or choose tasks to solve in OSS projects.
- North America > United States > Colorado (0.04)
- North America > United States > Arizona (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
Defect Prediction with Content-based Features
Pham, Hung Viet, Nguyen, Tung Thanh
Traditional defect prediction approaches often use metrics that measure the complexity of the design or implementing code of a software system, such as the number of lines of code in a source file. In this paper, we explore a different approach based on content of source code. Our key assumption is that source code of a software system contains information about its technical aspects and those aspects might have different levels of defect-proneness. Thus, content-based features such as words, topics, data types, and package names extracted from a source code file could be used to predict its defects. We have performed an extensive empirical evaluation and found that: i) such content-based features have higher predictive power than code complexity metrics and ii) the use of feature selection, reduction, and combination further improves the prediction performance.
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
Explaining the Contributing Factors for Vulnerability Detection in Machine Learning
Mouine, Esma, Liu, Yan, Xiao, Lu, Kazman, Rick, Wang, Xiao
There is an increasing trend to mine vulnerabilities from software repositories and use machine learning techniques to automatically detect software vulnerabilities. A fundamental but unresolved research question is: how do different factors in the mining and learning process impact the accuracy of identifying vulnerabilities in software projects of varying characteristics? Substantial research has been dedicated in this area, including source code static analysis, software repository mining, and NLP-based machine learning. However, practitioners lack experience regarding the key factors for building a baseline model of the state-of-the-art. In addition, there lacks of experience regarding the transferability of the vulnerability signatures from project to project. This study investigates how the combination of different vulnerability features and three representative machine learning models impact the accuracy of vulnerability detection in 17 real-world projects. We examine two types of vulnerability representations: 1) code features extracted through NLP with varying tokenization strategies and three different embedding techniques (bag-of-words, word2vec, and fastText) and 2) a set of eight architectural metrics that capture the abstract design of the software systems. The three machine learning algorithms include a random forest model, a support vector machines model, and a residual neural network model. The analysis shows a recommended baseline model with signatures extracted through bag-of-words embedding, combined with the random forest, consistently increases the detection accuracy by about 4% compared to other combinations in all 17 projects. Furthermore, we observe the limitation of transferring vulnerability signatures across domains based on our experiments.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study > Negative Result (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Generalizable Long-Horizon Manipulations with Large Language Models
Zhou, Haoyu, Ding, Mingyu, Peng, Weikun, Tomizuka, Masayoshi, Shao, Lin, Gan, Chuang
This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations with novel objects and unseen tasks. These task conditions serve as guides for the generation and adjustment of Dynamic Movement Primitives (DMP) trajectories for long-horizon task execution. We further create a challenging robotic manipulation task suite based on Pybullet for long-horizon task evaluation. Extensive experiments in both simulated and real-world environments demonstrate the effectiveness of our framework on both familiar tasks involving new objects and novel but related tasks, highlighting the potential of LLMs in enhancing robotic system versatility and adaptability. Project website: https://object814.github.io/Task-Condition-With-LLM/
- North America > United States > Ohio > Montgomery County > Dayton (0.05)
- Europe > Netherlands (0.05)